FlowerVLA is a vision-language-action (VLA) flow model pre-trained on the CALVIN D dataset. It uses an efficient flow-matching architecture to learn general-purpose robot manipulation policies with only about 1 billion parameters.
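
As a rough illustration of what flow matching means for action generation, here is a minimal sketch in plain Python. It is not FlowerVLA's implementation: the straight-line noise-to-action path, the Euler sampler, and every name in it (`interpolate`, `target_velocity`, `sample_action`, `oracle_velocity`, `TARGET`) are assumptions made for illustration. A closed-form velocity field stands in for the trained network, which in a real VLA model would also be conditioned on images and language.

```python
import random


def interpolate(noise, action, t):
    """Point on the straight-line path from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise, action):
    """Regression target for a velocity network on that straight-line path."""
    return [a - n for n, a in zip(noise, action)]


def sample_action(velocity_fn, dim, steps=10, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a trained network: the exact velocity field
# that transports any point toward one fixed target action.
TARGET = [0.5, -0.2, 0.1]


def oracle_velocity(x, t):
    return [(a - xi) / (1.0 - t) for a, xi in zip(TARGET, x)]


action = sample_action(oracle_velocity, dim=3)
```

With the oracle field, `action` lands on `TARGET` up to floating-point error. In training, a network would instead be fit to `target_velocity` at points produced by `interpolate`, and that learned network would replace `oracle_velocity` at sampling time.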
Tags: Multimodal Fusion, English